Abstract

We perform a systematic analytical study of finite size effects in separable recurrent 
neural network models with sequential dynamics, away from saturation. We find two types 
of finite size effects: thermal fluctuations, and disorder-induced 'frozen' corrections to the 
mean-field laws. The finite size effects are described by equations that correspond to a time- 
dependent Ornstein-Uhlenbeck process. We show how the theory can be used to understand 
and quantify various finite size phenomena in recurrent neural networks, with and without 
detailed balance. 
1 Introduction



Infinite-range spin models of recurrent neural networks, with information stored in the values
of the interaction strengths between pairs of spins, have been studied intensively with statistical
mechanical tools following the papers […] and […]. The first wave of such studies involved mainly
equilibrium analyses, and was consequently restricted to models obeying detailed balance. Away
from the saturation regime (i.e. when only a small number of patterns is stored) such models can be
solved with standard mean-field techniques and display standard mean-field behaviour. In
contrast, in the saturation regime (where an extensive number of patterns is stored) tools from
spin-glass theory are required (replica theory), and non-trivial phases occur. The second wave of
studies employed tools from non-equilibrium statistical mechanics, for which a restriction to
detailed balance models is not needed. However, in view of the highly non-trivial nature of the
glassy non-ergodic dynamics of models close to saturation, most dynamical studies have been
restricted to recurrent neural networks with only small numbers of patterns stored. For an overview
of the relevant literature see textbooks such as […] or reviews such as […].

In spite of the fact that finite size effects have been reported regularly in the literature, and that
they are known to persist even for system sizes up to $N \sim 10^5$ […], it appears that systematic
studies of finite size effects in recurrent networks which go beyond pilot studies such as […] have
not yet been performed. The purpose of this paper is to carry out a comprehensive analysis
of finite size effects (to first non-trivial order in the system size) for a reasonably general class
of recurrent neural network models, where the interaction matrix has a separable structure.
This class contains detailed balance models, as well as models without detailed balance. Away
from saturation, finite size effects in these systems mainly take the form of thermal fluctuations
of order $O(N^{-1/2})$ around the mean-field trajectories of the dynamical order parameters, together
with disorder-induced 'frozen' corrections to the mean-field laws. Close to saturation even the
$N \to \infty$ dynamics cannot be solved in explicit form (describing transients is found to necessitate
approximations in all of the present approaches: in the path integral formalism […, 8] as well as
in dynamical replica theory […]). In the latter regime the development of a finite size
theory would therefore be premature.

We study the evolution of finite recurrent neural network models away from saturation and
with Glauber-type (stochastic) neuronal dynamics. We carry out a Kramers-Moyal expansion for
the system's natural dynamic order parameters, valid on finite time-scales, and calculate the
statistical properties of finite size effects to first non-trivial order in $1/N$. The finite size
effects turn out to be governed by a time-dependent Ornstein-Uhlenbeck process. Our theory is used
to analyse the dependence of finite size effects on detailed balance, scaling properties of
fluctuations close to phase boundaries, and escape processes in critical models which are driven
purely by finite size effects. Comparison with extensive numerical simulations confirms the
theoretical predictions in all cases.

2 Derivation of Macroscopic Laws 
2.1 Model Definitions and Simple Relations 

We consider a system composed of a large, but finite, number $N$ of interconnected neurons,
modeled as Ising spins $\sigma_i \in \{-1,1\}$. The vector $\sigma(t) = (\sigma_1(t), \ldots, \sigma_N(t))$ defines the state
of the system at time $t$. The dynamics of the system is defined by a master equation for the
microscopic probability distribution $p_t(\sigma)$:

$$\frac{d}{dt} p_t(\sigma) = \sum_i \left\{ w_i(F_i\sigma)\, p_t(F_i\sigma) - w_i(\sigma)\, p_t(\sigma) \right\} \qquad (1)$$

$$w_i(\sigma) = \frac{1}{2}\left[1 - \sigma_i \tanh(\beta h_i(\sigma))\right] \qquad (2)$$

where $w_i(\sigma)$ defines the rate of the single-spin transitions $\sigma_i \to -\sigma_i$, $\beta = T^{-1}$ (the inverse
temperature) controls the stochasticity in the dynamics, and $F_i$ is an operator that flips the
$i$-th spin, i.e. $F_i f(\sigma_1, \ldots, \sigma_i, \ldots, \sigma_N) = f(\sigma_1, \ldots, -\sigma_i, \ldots, \sigma_N)$. The local field $h_i(\sigma)$ is given by the
usual linear expression

$$h_i(\sigma) = \sum_j J_{ij}\sigma_j + \theta_i \qquad (3)$$

where $J_{ij}$ is the strength of the synaptic connection from neuron $j$ to neuron $i$, and $\theta_i$ is a
response threshold. The interactions are assumed to result from a learning process involving
a finite number $p$ of randomly chosen binary patterns $\xi^\mu = (\xi_1^\mu, \ldots, \xi_N^\mu) \in \{-1,1\}^N$, with
$\mu = 1, \ldots, p$. We restrict ourselves to situations where the interactions have a separable form
(see e.g. […, 11, …]) and introduce a parameter $\Delta \in \{0,1\}$ to control whether or not
self-interactions $J_{ii}$ will be allowed:

$$J_{ij} = \frac{1}{N}\left[1 - \Delta\delta_{ij}\right] \sum_{\mu\nu=1}^{p} \xi_i^\mu A_{\mu\nu} \xi_j^\nu \qquad (4)$$

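Given the process (1,2), we can define averages over the microscopic ensemble in the usual
way, and find simple relations for the temporal derivatives of such averages:

$$\langle f(\sigma)\rangle_t \equiv \sum_\sigma p_t(\sigma) f(\sigma), \qquad \frac{d}{dt}\langle f(\sigma)\rangle_t = \Big\langle \sum_i w_i(\sigma)\left[f(F_i\sigma) - f(\sigma)\right]\Big\rangle_t \qquad (5)$$

In particular, application to $f(\sigma) = \sigma_k$ gives

$$\frac{d}{dt}\langle\sigma_k\rangle_t = \langle\tanh(\beta h_k(\sigma))\rangle_t - \langle\sigma_k\rangle_t \qquad (6)$$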

In order to make the transition to a macroscopic description of the process, we define the usual 
pattern overlaps. These observables (which for finite N take discrete values only) measure the 
similarity between the state of the system and each of the p stored patterns: 

$$m(\sigma) = (m_1(\sigma), \ldots, m_p(\sigma)), \qquad m_\mu(\sigma) = \frac{1}{N}\sum_{i=1}^{N} \xi_i^\mu \sigma_i \qquad (7)$$
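The definitions (2), (3), (4) and (7) can be made concrete with a small numerical sketch. The function names, the list-of-lists data layout and the default choice `delta=1` below are illustrative assumptions, not part of the original text; the code only evaluates the stated formulas for a given spin configuration.

```python
import math

def overlaps(sigma, patterns):
    """Pattern overlaps m_mu = (1/N) sum_i xi_i^mu sigma_i, as in (7)."""
    n = len(sigma)
    return [sum(x * s for x, s in zip(xi, sigma)) / n for xi in patterns]

def local_field(i, sigma, patterns, A, theta, delta=1):
    """Local field h_i of (3) for the separable couplings (4):
    J_ij = (1/N)[1 - delta*d_ij] sum_{mu,nu} xi_i^mu A[mu][nu] xi_j^nu."""
    n, p = len(sigma), len(patterns)
    m = overlaps(sigma, patterns)
    # xi_i . A m: field generated by all spins, spin i itself included
    field = sum(patterns[mu][i] * A[mu][nu] * m[nu]
                for mu in range(p) for nu in range(p))
    # self-interaction contribution, removed when delta = 1
    self_term = sum(patterns[mu][i] * A[mu][nu] * patterns[nu][i]
                    for mu in range(p) for nu in range(p)) * sigma[i] / n
    return field - delta * self_term + theta[i]

def flip_rate(i, sigma, patterns, A, theta, beta, delta=1):
    """Glauber transition rate w_i of (2) for the flip sigma_i -> -sigma_i."""
    h = local_field(i, sigma, patterns, A, theta, delta)
    return 0.5 * (1.0 - sigma[i] * math.tanh(beta * h))
```

Writing the field as $\xi_i \cdot A m$ minus a self-interaction term of relative order $1/N$ mirrors the way the local fields are later expressed in terms of the overlaps.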

The probability density for the macroscopic variables m is given by: 

$$P_t(m) = \sum_\sigma p_t(\sigma)\,\delta[m - m(\sigma)] \qquad (8)$$

We next define conditional, or sub-shell, averages of observables $f(\sigma)$. These are averages over
the statistical ensemble, with the microscopic probability distribution $p_t(\sigma)$, restricted to those
micro-states $\sigma$ that obey $m(\sigma) = m$ (in a distributional sense):

$$\langle f(\sigma)\rangle_{m;t} = \frac{\sum_\sigma p_t(\sigma)\,\delta[m - m(\sigma)]\, f(\sigma)}{\sum_\sigma p_t(\sigma)\,\delta[m - m(\sigma)]} \qquad (9)$$

Note that the $\delta$-distribution in the definition (9) allows us to replace all occurrences of $m(\sigma)$ in
$f$ simply by $m$:

$$\langle f[\sigma, m(\sigma)]\rangle_{m;t} = \langle f[\sigma, m]\rangle_{m;t} \qquad (10)$$



2.2 The Kramers-Moyal Expansion 

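The dynamic equation for the macroscopic variables $m(\sigma)$ can be obtained by making the choice
$f(\sigma) = \delta[m - m(\sigma)]$ in equation (5):

$$\frac{d}{dt}P_t(m) = \sum_\sigma p_t(\sigma) \sum_i w_i(\sigma)\left\{\delta\Big[m - m(\sigma) + \frac{2}{N}\sigma_i\xi_i\Big] - \delta[m - m(\sigma)]\right\}$$

Inside this expression we make a Taylor expansion in powers of the vector $\frac{2}{N}\sigma_i\xi_i$ and write the
result in terms of sub-shell averages (9):

$$\frac{d}{dt}P_t(m) = \sum_{\ell\geq 1}\frac{1}{\ell!}\sum_{\mu_1=1}^{p}\cdots\sum_{\mu_\ell=1}^{p} \frac{\partial^\ell}{\partial m_{\mu_1}\cdots\partial m_{\mu_\ell}}\left\{P_t(m)\,F^{(\ell)}_{\mu_1\cdots\mu_\ell}[m;t]\right\} \qquad (11)$$

$$F^{(\ell)}_{\mu_1\cdots\mu_\ell}[m;t] = \Big\langle\sum_i w_i(\sigma)\Big(\frac{2\sigma_i}{N}\Big)^{\ell}\xi_i^{\mu_1}\cdots\xi_i^{\mu_\ell}\Big\rangle_{m;t} = O(N^{1-\ell}) \qquad (12)$$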

This is the so-called Kramers-Moyal expansion, applied to the present class of models. 

Since equation (11) cannot be solved exactly, we follow the standard procedure for 'large
systems' (see e.g. [13]), and expand (11) in powers of $1/N$. By keeping only the two leading
orders, we obtain, at least on finite time-scales (i.e. on times not scaling with the system size
$N$), the $N \to \infty$ (mean-field) equations plus the leading order contribution due to the finite size
effects (footnote 1):

$$\frac{d}{dt}P_t(m) = \sum_{\mu=1}^{p}\frac{\partial}{\partial m_\mu}\left\{P_t(m)\,F^{(1)}_\mu[m;t]\right\} + \frac{1}{2}\sum_{\mu\nu=1}^{p}\frac{\partial^2}{\partial m_\mu\partial m_\nu}\left\{P_t(m)\,F^{(2)}_{\mu\nu}[m;t]\right\} + O(N^{-2}) \qquad (13)$$

Upon insertion of the transition rates $w_i(\sigma)$ (2) and the local fields (3), written in terms of the
overlaps (7), we can work out explicitly and simplify (with e.g. (10)) the various terms, giving

$$F^{(1)}_\mu[m;t] = m_\mu - \frac{1}{N}\sum_i \xi_i^\mu\tanh\beta[\xi_i\cdot Am + \theta_i] + \frac{\Delta\beta}{N^2}\sum_i\langle\sigma_i\rangle_{m;t}\,\xi_i^\mu(\xi_i\cdot A\xi_i)\left[1 - \tanh^2\beta[\xi_i\cdot Am+\theta_i]\right] + O(N^{-2}) \qquad (14)$$

Similarly:

$$\frac{1}{2}F^{(2)}_{\mu\nu}[m;t] = \frac{1}{N^2}\sum_i\xi_i^\mu\xi_i^\nu\left[1 - \langle\sigma_i\rangle_{m;t}\tanh\beta(\xi_i\cdot Am+\theta_i)\right] + O(N^{-2}) \qquad (15)$$

1 Note that due to Pawula's Theorem [14] we have but three options: 1) to retain only the $O(N^0)$ terms of (11)
(describing the infinite system), 2) to include in addition the $O(N^{-1})$ terms, or 3) to keep all remaining orders
in $N$. Retaining a finite number of terms, up to and including order $N^{-n}$ with $n > 1$, would generate solutions
$P_t(m)$ which violate the obvious condition that they be positive definite (i.e. represent probability densities).



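In the limit $N \to \infty$ equation (13) reduces to a Liouville equation:

$$\frac{d}{dt}P_t(m) = \sum_\mu\frac{\partial}{\partial m_\mu}\left\{P_t(m)\left[m_\mu - \langle\xi_\mu\tanh\beta[\xi\cdot Am+\theta]\rangle_{\xi,\theta}\right]\right\}$$

with the deterministic solution

$$P_t(m) = \delta[m - m^*(t)]$$

where we defined

$$\frac{d}{dt}m^*(t) = \langle\xi\tanh\beta[\xi\cdot Am^*(t)+\theta]\rangle_{\xi,\theta} - m^*(t) \qquad (16)$$

$$\langle g[\xi,\theta]\rangle_{\xi,\theta} = \lim_{N\to\infty}\frac{1}{N}\sum_i g[\xi_i,\theta_i], \qquad \xi = (\xi_1,\ldots,\xi_p)$$

with $\xi_i = (\xi_i^1,\ldots,\xi_i^p)$.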
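The deterministic law (16) is easy to explore numerically when $p$ is small. The sketch below assumes zero thresholds and unbiased binary pattern components, so that the average over $\xi$ becomes an exact sum over the $2^p$ vectors in $\{-1,1\}^p$; the function names and the forward-Euler scheme are illustrative choices rather than anything prescribed by the text.

```python
import math
from itertools import product

def mean_field_rhs(m, A, beta):
    """Right-hand side of (16) for zero thresholds:
    <xi tanh(beta xi.A m)>_xi - m, averaged by exact enumeration."""
    p = len(m)
    Am = [sum(A[mu][nu] * m[nu] for nu in range(p)) for mu in range(p)]
    avg = [0.0] * p
    for xi in product((-1, 1), repeat=p):
        t = math.tanh(beta * sum(x * a for x, a in zip(xi, Am)))
        for mu in range(p):
            avg[mu] += xi[mu] * t / 2 ** p
    return [avg[mu] - m[mu] for mu in range(p)]

def integrate_mean_field(m0, A, beta, t_end, dt=1e-3):
    """Forward-Euler integration of dm*/dt (a simple, low-order scheme)."""
    m = list(m0)
    for _ in range(int(round(t_end / dt))):
        rhs = mean_field_rhs(m, A, beta)
        m = [mi + dt * ri for mi, ri in zip(m, rhs)]
    return m
```

For example, `integrate_mean_field([0.3, 0.0], [[1, 0], [0, 1]], beta=2.0, t_end=30.0)` relaxes towards a state of the form $(m^*, 0)$ with $m^* = \tanh(\beta m^*)$.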

3 Description via Rescaled Variables 
3.1 Derivation of Fokker-Planck Equation 

The stochastic vector $m(\sigma)$ can apparently be written as the sum of a deterministic term $m^*(t)$
and a fluctuating term, with the latter vanishing for $N \to \infty$. Since for mean-field models the
overlaps can be seen as averages over $N$ independent random variables, one would expect,
from the central limit theorem, the fluctuating term to scale as $N^{-1/2}$. Therefore we define a new
stochastic variable $q(\sigma)$:

$$q(\sigma) = \sqrt{N}\,[m(\sigma) - m^*(t)], \qquad \mathcal{P}_t(q) = \int dm\, P_t(m)\,\delta\big[q - \sqrt{N}[m - m^*(t)]\big] \qquad (17)$$

in which $m^*(t)$ is the solution of the deterministic equation (16). Working out the temporal
derivative of $\mathcal{P}_t(q)$, with the help of the macroscopic equation (13), and taking the $N \to \infty$
limit, leads to a convenient description of the leading $O(N^{-1/2})$ finite size effects in terms of a
Fokker-Planck equation for the rescaled variables $q$:

$$m(\sigma(t)) = m^*(t) + \frac{1}{\sqrt{N}}\,q(t) + O\Big(\frac{1}{N}\Big)$$

$$\frac{d}{dt}\mathcal{P}_t(q) = \sum_\mu\frac{\partial}{\partial q_\mu}\left\{\mathcal{P}_t(q)\,F_\mu[q;t]\right\} + \sum_{\mu\nu}\frac{\partial^2}{\partial q_\mu\partial q_\nu}\left\{\mathcal{P}_t(q)\,D_{\mu\nu}[q;t]\right\} \qquad (18)$$
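in which the flow term is given by

$$F_\mu[q;t] = q_\mu - \beta\big\langle\xi_\mu(\xi\cdot Aq)\left[1-\tanh^2\beta[\xi\cdot Am^*(t)+\theta]\right]\big\rangle_{\xi,\theta} + \lim_{N\to\infty}\sqrt{N}\Big\{\langle\xi_\mu\tanh\beta[\xi\cdot Am^*(t)+\theta]\rangle_{\xi,\theta} - \frac{1}{N}\sum_i\xi_i^\mu\tanh\beta[\xi_i\cdot Am^*(t)+\theta_i]\Big\}$$

where we used (14), (16) and (17). The last term in $F_\mu[q;t]$ describes a 'frozen' finite size
correction to the flow field, depending on the microscopic realization of the pattern components.
Similarly, we can work out the diffusion matrix:

$$D_{\mu\nu}[q;t] = \lim_{N\to\infty}\frac{1}{N}\sum_i\xi_i^\mu\xi_i^\nu\left[1 - \langle\sigma_i\rangle_{m^*(t);t}\tanh\beta[\xi_i\cdot Am^*(t)+\theta_i]\right] \qquad (19)$$

where we have used $\langle\sigma_i\rangle_{m;t} = \langle\sigma_i\rangle_{m^*(t);t} + O(N^{-1/2})$. According to (6) we may also use

$$\langle\sigma_i\rangle_{m^*(t);t} = \sigma_i(0)\,e^{-t} + \int_0^t ds\, e^{s-t}\tanh\beta[\xi_i\cdot Am^*(s)+\theta_i] + O\Big(\frac{1}{\sqrt{N}}\Big) \qquad (20)$$

We conclude that the diffusion matrix does not depend on $q$:

$$D_{\mu\nu}(t) = \langle\xi_\mu\xi_\nu\rangle_{\xi,\theta} - e^{-t}\lim_{N\to\infty}\frac{1}{N}\sum_i\xi_i^\mu\xi_i^\nu\sigma_i(0)\tanh\beta[\xi_i\cdot Am^*(t)+\theta_i] - \int_0^t ds\,e^{s-t}\big\langle\xi_\mu\xi_\nu\tanh\beta[\xi\cdot Am^*(s)+\theta]\tanh\beta[\xi\cdot Am^*(t)+\theta]\big\rangle_{\xi,\theta} \qquad (21)$$

The fact that the above limits exist and give a non-trivial flow term and diffusion matrix for
(18) is the a-posteriori justification of the ansatz (17).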



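We finally have to specify the pattern and threshold statistics in order to analyze the
Fokker-Planck equation (18). We choose independently drawn pattern components $\xi_i^\mu \in \{-1,1\}$ (with
equal probabilities) and independently drawn neural thresholds $\theta_i$ (with probability distribution
$W(\theta)$):

$$\langle g[\xi,\theta]\rangle_{\xi,\theta} = 2^{-p}\sum_{\xi\in\{-1,1\}^p}\int d\theta\,W(\theta)\,g[\xi,\theta] \qquad (22)$$

which gives convenient relations like

$$\langle f(\xi)g(\theta)\rangle_{\xi,\theta} = 2^{-p}\sum_{\xi\in\{-1,1\}^p}f(\xi)\int d\theta\,W(\theta)\,g(\theta) = \langle f(\xi)\rangle_\xi\,\langle g(\theta)\rangle_\theta$$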



The flow term $F[q;t]$ in (18) can be written as the sum of two contributions; the first ($K$)
depends on the specific microscopic realization of the pattern components, whereas the second
(with the matrix $L$) depends only on the probability distribution of the pattern components:

$$F_\mu[q;t] = K_\mu(t) + \sum_\nu L_{\mu\nu}(t)\,q_\nu \qquad (23)$$

$$K_\mu(t) = \lim_{N\to\infty}\sqrt{N}\Big\{\langle\xi_\mu\tanh\beta[\xi\cdot Am^*(t)+\theta]\rangle_{\xi,\theta} - \frac{1}{N}\sum_i\xi_i^\mu\tanh\beta[\xi_i\cdot Am^*(t)+\theta_i]\Big\} \qquad (24)$$

$$L_{\mu\nu}(t) = \delta_{\mu\nu} - \beta\sum_\lambda\big\langle\xi_\mu\xi_\lambda\left[1-\tanh^2\beta[\xi\cdot Am^*(t)+\theta]\right]\big\rangle_{\xi,\theta}A_{\lambda\nu} \qquad (25)$$

The diffusion matrix in (18) is symmetric, and can be simplified to

$$D_{\mu\nu}(t) = \delta_{\mu\nu} - e^{-t}\lim_{N\to\infty}\frac{1}{N}\sum_i\xi_i^\mu\xi_i^\nu\sigma_i(0)\tanh\beta[\xi_i\cdot Am^*(t)+\theta_i] - \int_0^t ds\,e^{s-t}\big\langle\xi_\mu\xi_\nu\tanh\beta[\xi\cdot Am^*(s)+\theta]\tanh\beta[\xi\cdot Am^*(t)+\theta]\big\rangle_{\xi,\theta} \qquad (26)$$

Equation (18), with its flow term (23) which is linear in the rescaled fluctuation variables $q$ and
with its $q$-independent diffusion matrix (26), is called the 'linear noise' Fokker-Planck equation;
it describes a so-called time-dependent Ornstein-Uhlenbeck process (see e.g. [15]).



3.2 General Solution 

The natural solution of the Ornstein-Uhlenbeck process is a Gaussian distribution:

$$\mathcal{P}_t(q) = \frac{1}{(2\pi)^{p/2}\det^{1/2}\Xi(t)}\exp\Big\{-\frac{1}{2}[q - \langle q\rangle_t]\cdot\Xi^{-1}(t)[q - \langle q\rangle_t]\Big\} \qquad (27)$$

It is fully characterized in the usual way by the time-dependent average $\langle q\rangle_t$ and the
time-dependent correlation matrix

$$\Xi_{\mu\nu}(t) = \langle q_\mu q_\nu\rangle_t - \langle q_\mu\rangle_t\langle q_\nu\rangle_t \qquad (28)$$

Here we denote averages over the distribution (27) as $\langle f(q)\rangle_t = \int dq\,\mathcal{P}_t(q)f(q)$. Insertion
of equation (27) as an ansatz into the Fokker-Planck equation (18) gives the following three
necessary and sufficient conditions for (27) to be a solution:

$$\frac{d}{dt}\langle q\rangle_t = -L(t)\langle q\rangle_t - K(t) \qquad (29)$$

$$\frac{d}{dt}\Xi(t) = -L(t)\Xi(t) - \Xi(t)L^\dagger(t) + 2D(t) \qquad (30)$$

$$\frac{d}{dt}\log\det\Xi(t) + 2\,\mathrm{Tr}\left[L - D\Xi^{-1}\right] = 0 \qquad (31)$$

with the (symmetric) diffusion matrix (26). Equations (29,30) define the evolution in time of
the moments of the distribution (27). Equation (31) is then solved automatically, which can be
seen by combining the Wronski identity $\frac{d}{dt}\log\det B = \mathrm{Tr}[B^{-1}\frac{d}{dt}B]$ with equation (30).
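The step from (30) to (31) can be written out in one line. The following is a sketch that uses only the Wronski identity, equation (30), the cyclic property of the trace, and $\mathrm{Tr}\,L^\dagger = \mathrm{Tr}\,L$ for a real matrix $L$:

```latex
\begin{align*}
\frac{d}{dt}\log\det\Xi
 &= \mathrm{Tr}\Big[\Xi^{-1}\frac{d}{dt}\Xi\Big]
  = \mathrm{Tr}\big[\Xi^{-1}\big(-L\Xi - \Xi L^\dagger + 2D\big)\big] \\
 &= -\mathrm{Tr}\,L - \mathrm{Tr}\,L^\dagger + 2\,\mathrm{Tr}\big[D\,\Xi^{-1}\big]
  = -2\,\mathrm{Tr}\big[L - D\,\Xi^{-1}\big]
\end{align*}
```

Moving the right-hand side to the left reproduces condition (31), so no independent constraint arises from it.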

Since the differential equations (29,30) are linear, they can be solved using standard
procedures (see e.g. [13, 15]). One defines the propagator $G(t)$ as the (matrix) solution of

$$\frac{d}{dt}G(t) = -L(t)G(t), \quad G(0) = I \qquad \text{or} \qquad G(t) = I - \int_0^t ds\,L(s)G(s) \qquad (32)$$

in which $I$ denotes the unit matrix. This allows us to express the solution of (29,30) in the
following compact way:

$$\langle q\rangle_t = G(t)\langle q\rangle_0 - G(t)\int_0^t ds\,G^{-1}(s)K(s) \qquad (33)$$

$$\Xi(t) = G(t)\,\Xi(0)\,G^\dagger(t) + 2\,G(t)\int_0^t ds\,G^{-1}(s)D(s)[G^\dagger(s)]^{-1}\,G^\dagger(t) \qquad (34)$$

(as can be verified by insertion). Note, however, that calculating the propagator (32) can still
be non-trivial.

3.3 Stationary States and Detailed Balance 



For large times $t \to \infty$ the dependence of (26) on the microscopic initial conditions vanishes.
Furthermore, for macroscopic stationary states, i.e. $m^*(t) = m^*$ for all $t$, with $m^*$ given by the
solution of the macroscopic fixed-point equation

$$m^* = \langle\xi\tanh\beta[\xi\cdot Am^*+\theta]\rangle_{\xi,\theta} \qquad (35)$$

we can in addition perform the time integration in (26). All flow and diffusion terms in the
Fokker-Planck equation (18) become independent of time, the convection matrix (25) can be
expressed in terms of the diffusion matrix as

$$L = I - \beta DA \qquad (36)$$

and our process (18) reduces to a time-independent Ornstein-Uhlenbeck process, characterised
by

$$F[q] = K + [I - \beta DA]\,q, \qquad D_{\mu\nu} = \delta_{\mu\nu} - \langle\xi_\mu\xi_\nu\tanh^2\beta[\xi\cdot Am^*+\theta]\rangle_{\xi,\theta}$$

$$K = \lim_{N\to\infty}\sqrt{N}\Big\{\langle\xi\tanh\beta[\xi\cdot Am^*+\theta]\rangle_{\xi,\theta} - \frac{1}{N}\sum_i\xi_i\tanh\beta[\xi_i\cdot Am^*+\theta_i]\Big\}$$

Since the matrix $L$ (36) is stationary, the propagator (32) reduces to $G(t) = \exp[-tL]$. Whether
or not a macroscopic stationary state $m^*(t) = m^*$ will be reached will depend on the choice
made for the matrix $A$.

A sufficient condition for asymptotic stationarity is (microscopic) detailed balance, which
states that, in addition to stationarity of the probability distribution $p_t(\sigma)$, there is no net
probability current between any two configurations $\sigma$ and $\sigma'$. For the models studied in the
present paper this translates into symmetry of the matrix $A$ and absence of self-interactions,
i.e. $\Delta = 1$ (apart from pathological exceptions, like systems with self-interactions only); see
e.g. [16, 17]. Note, however, that our equations (14,15) show that presence or absence of
self-interactions does not yet play a role in the first two leading orders in the system size. We will
now inspect the conditions for the Fokker-Planck equation (18) to have a stationary solution,
and show that for this solution to obey detailed balance (i.e. for there to be no net probability
current in $q$-space) we must again require symmetry of the matrix $A$. Equation (18) can be
written as a continuity equation for the probability density $\mathcal{P}_t(q)$:

$$\frac{d}{dt}\mathcal{P}_t(q) + \nabla\cdot J_t(q) = 0$$
with $\nabla = (\frac{\partial}{\partial q_1}, \ldots, \frac{\partial}{\partial q_p})$ and with, in the case of macroscopic stationarity,

$$J_t(q) = \mathcal{P}_t(q)\left\{D\,\Xi^{-1}(t)[q - \langle q\rangle_t] - K - Lq\right\} \qquad (37)$$

From (29,30) we deduce that for (27) to be a stationary solution, i.e. $\frac{d}{dt}\langle q\rangle = 0$ and $\frac{d}{dt}\Xi = 0$,
we must require

$$\langle q\rangle = -L^{-1}K, \qquad \frac{1}{2}\left\{L\Xi + (L\Xi)^\dagger\right\} = D \qquad (38)$$

For such stationary states the probability current (37) reduces to

$$J(q) = \mathcal{P}(q)\left[D\,\Xi^{-1} - L\right](q + L^{-1}K) \qquad (39)$$

We conclude that detailed balance, i.e. a vanishing current, requires in addition to (38) that

$$L\Xi = D \qquad (40)$$

Combination of (38,40) leads to the condition $DL^\dagger = LD$, which, with identity (36), translates
into $DA^\dagger D = DAD$. We now use the symmetry and non-negativity of the stationary diffusion
matrix $D$, i.e. $x\cdot Dx = \langle(x\cdot\xi)^2[1-\tanh^2\beta[\xi\cdot Am^*+\theta]]\rangle_{\xi,\theta} \geq 0$ (with $x\cdot Dx = 0$ only for
$x = 0$). We denote with $\{|n\rangle\}$ the orthogonal basis of normalised eigenvectors of $D$, and with
$\{d_n\}$ the corresponding (positive) eigenvalues. This allows us to derive from $DA^\dagger D = DAD$
that $\forall n,m:\ d_n d_m\langle n|[A^\dagger - A]|m\rangle = 0$. This implies that $A = A^\dagger$, which thus is found to be not
only a sufficient condition, but also a necessary condition for a stationary solution of equation
(18) to obey detailed balance.
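The chain of implications used in this argument can be spelled out explicitly. The following worked steps are a sketch that assumes $L$ is invertible (as is already required for the stationary average in (38)) and uses only the symmetry of $\Xi$ and $D$ together with the identity (36):

```latex
\begin{align*}
L\Xi = D \;&\Rightarrow\; \Xi = L^{-1}D, \\
\Xi = \Xi^\dagger \;&\Rightarrow\; L^{-1}D = D\,(L^\dagger)^{-1}
  \;\Rightarrow\; D L^\dagger = L D, \\
L = I - \beta D A \;&\Rightarrow\; D\,(I - \beta A^\dagger D) = (I - \beta D A)\,D
  \;\Rightarrow\; D A^\dagger D = D A D, \\
D|n\rangle = d_n|n\rangle \;&\Rightarrow\;
  \langle n|D(A^\dagger - A)D|m\rangle
  = d_n d_m\,\langle n|(A^\dagger - A)|m\rangle = 0 .
\end{align*}
```

Since all $d_n$ are strictly positive, every matrix element of $A^\dagger - A$ in a complete basis vanishes, which is the statement $A = A^\dagger$.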



4 Application to Associative Memories 

Our first application is an associative memory model, which generalizes the standard model
of […] by allowing for patterns to be stored with different embedding strengths:
$J_{ij} = \frac{1}{N}[1 - \Delta\delta_{ij}]\sum_\mu w_\mu\xi_i^\mu\xi_j^\mu$ (with $0 < w_\mu \leq 1$ for all $\mu$). This model, due to [18], corresponds to the choice
$A_{\mu\nu} = w_\mu\delta_{\mu\nu}$ in the language of (4), and thus obeys detailed balance. For simplicity we choose
zero thresholds, i.e. $W(\theta) = \delta(\theta)$. We will only study finite size corrections to the so-called
'pure states', where $m^*_\mu(t) = m(t)\delta_{\mu\lambda}$, which are the most important macroscopic solutions from
both a thermodynamic and an information processing point of view. Without loss of generality
we can choose $m(0) > 0$ and $\lambda = 1$ (as long as we refrain from ordering the embedding strengths
with respect to magnitude), so

$$m^*_\mu(t) = m^*(t)\,\delta_{\mu 1}, \qquad \frac{d}{dt}m^*(t) = \tanh\beta[w_1m^*(t)] - m^*(t) \qquad (41)$$

The above mean-field equation (41) will always evolve towards a fixed-point, given by the solution
of $m^* = \tanh\beta[w_1m^*]$. Above the critical temperature $T_c = w_1$ the macroscopic fixed-point is
paramagnetic, i.e. $m^* = 0$; below $T_c$ one finds an ordered state, i.e. $m^* > 0$, which represents
retrieval of pattern one. Both fixed-points, however, need not be stable against perturbations
in the direction of non-nominated patterns [18]. We have to define initial conditions that will
generate a pure macroscopic state, for which we choose

$$p_0(\sigma) = \prod_i\Big\{\frac{1}{2}[1+m(0)]\,\delta_{\sigma_i,\xi_i^1} + \frac{1}{2}[1-m(0)]\,\delta_{\sigma_i,-\xi_i^1}\Big\} \qquad (42)$$

This indeed gives $m^*_\mu(0) = m(0)\,\delta_{\mu 1}$, as it should.
4.1 Statistics of Finite Size Effects

The restriction to 'pure' macroscopic states simplifies our finite size analysis considerably. The 
relevant objects in the Fokker-Planck equation become 



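$$L_{\mu\nu}(t) = \ell_\mu(t)\,\delta_{\mu\nu}, \qquad D_{\mu\nu}(t) = D(t)\,\delta_{\mu\nu}, \qquad K_\mu(t) = K(t)\,R_\mu \qquad (43)$$

with the scalar functions

$$K(t) = -\tanh\beta[w_1m^*(t)], \qquad \ell_\mu(t) = 1 - \beta w_\mu\left[1-\tanh^2\beta[w_1m^*(t)]\right] \qquad (44)$$

$$D(t) = 1 - \tanh\beta[w_1m^*(t)]\Big\{e^{-t}m(0) + \int_0^t ds\,e^{s-t}\tanh\beta[w_1m^*(s)]\Big\} \qquad (45)$$

and with the (stationary) vector $R$, defined as

$$R_{\mu>1} = \lim_{N\to\infty}\frac{1}{\sqrt{N}}\sum_i\xi_i^1\xi_i^\mu, \qquad R_1 = 0 \qquad (46)$$

The propagator (32) now becomes trivial:

$$G_{\mu\nu}(t) = \delta_{\mu\nu}\,e^{-\int_0^t ds\,\ell_\mu(s)}$$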



This will enable us to calculate the moments of the distribution $\mathcal{P}_t(q)$ at any time explicitly.
Since (42) describes statistically independent initial components $\sigma_i$, the initial distribution $\mathcal{P}_0(q)$
is Gaussian. From (42) it follows that $\langle q\rangle_0 = m(0)R$ and $\Xi(0) = [1 - m^2(0)]\,I$. The moments
at any time $t > 0$ then follow from (33,34):

$$\langle q_\mu\rangle_t = R_\mu\Big\{m(0)\,e^{-\int_0^t ds\,\ell_\mu(s)} + \int_0^t ds\,e^{-\int_s^t du\,\ell_\mu(u)}\tanh\beta[w_1m^*(s)]\Big\} \qquad (47)$$

$$\Xi_{\mu\nu}(t) = \delta_{\mu\nu}\,\Xi_\mu(t), \qquad \Xi_\mu(t) = [1-m^2(0)]\,e^{-2\int_0^t ds\,\ell_\mu(s)} + 2\int_0^t ds\,e^{-2\int_s^t du\,\ell_\mu(u)}D(s) \qquad (48)$$

This gives the full distribution

$$\mathcal{P}_t(q) = \prod_\mu\frac{1}{\sqrt{2\pi\,\Xi_\mu(t)}}\exp\Big\{-\frac{1}{2}\frac{[q_\mu - \langle q_\mu\rangle_t]^2}{\Xi_\mu(t)}\Big\} \qquad (49)$$

which describes uncoupled fluctuations, together with 'frozen' finite size corrections to the
overlaps corresponding to uncondensed patterns. The above results obviously break down when the
propagator develops runaway solutions, which is likely to happen at phase transitions.
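For a single fluctuation direction the moment equations (29,30) reduce to two scalar ODEs that can be integrated alongside the mean-field law (41). The sketch below is one possible way to do this: the function name, the forward-Euler scheme and the step size are illustrative assumptions. It also uses the observation that the bracket in (45) obeys the same linear differential equation and initial condition as $m^*(t)$, so that $D(t) = 1 - m^*(t)\tanh\beta[w_1m^*(t)]$.

```python
import math

def integrate_pure_state_moments(beta, w1, w_mu, m0, t_end, dt=1e-4):
    """Forward-Euler integration of (41) together with the scalar moment
    equations for one uncondensed direction mu > 1.

    Returns (m, qbar, var) = (m*(t), <q_mu>_t / R_mu, Xi_mu(t)) at t = t_end.
    """
    m = m0                  # m*(0) = m(0)
    qbar = m0               # <q_mu>_0 / R_mu = m(0)
    var = 1.0 - m0 * m0     # Xi_mu(0) = 1 - m(0)^2
    for _ in range(int(round(t_end / dt))):
        th = math.tanh(beta * w1 * m)
        ell = 1.0 - beta * w_mu * (1.0 - th * th)   # ell_mu(t) of (44)
        # bracket in (45) equals m*(t), hence D(t) = 1 - tanh(...) * m*(t)
        diff = 1.0 - th * m
        dm = th - m                                 # mean-field law (41)
        dq = -ell * qbar + th                       # (29) with K(t) = -tanh(...)
        dvar = -2.0 * ell * var + 2.0 * diff        # (30), diagonal direction
        m, qbar, var = m + dt * dm, qbar + dt * dq, var + dt * dvar
    return m, qbar, var
```

At very low temperature (large `beta`) the output can be compared with the closed-form zero temperature expressions of section 4.3, which provides a simple consistency check on both the code and the formulas.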

4.2 Near the Ordering Transition 

We will inspect the behaviour of the finite size effects in stationary states close to the phase
transition separating the paramagnetic from the ordered state. For stationary states, where
$m^*(t) = m^*$ for all $t \geq 0$ (with $m^*$ given by the solution of $m^* = \tanh\beta[w_1m^*]$), we have in the
asymptotic region (i.e. for $t \to \infty$) $K(t) = K$, $\ell_\mu(t) = \ell_\mu$ and $D(t) = D$, with

$$K = -m^*, \qquad \ell_\mu = 1 - \beta w_\mu\left[1-(m^*)^2\right], \qquad D = 1 - (m^*)^2$$

This gives

$$\langle q_\mu\rangle_\infty = \frac{m^*R_\mu}{1 - \beta w_\mu[1-(m^*)^2]}, \qquad \Xi_\mu(\infty) = \frac{1-(m^*)^2}{1 - \beta w_\mu[1-(m^*)^2]}$$

In the paramagnetic state, i.e. $m^* = 0$, we thus find

$$\langle q_\mu\rangle_\infty = 0, \qquad \Xi_\mu(\infty) = \frac{T}{T - w_\mu}$$

The fluctuations diverge for $T \downarrow T_c = \max_\mu w_\mu$, which indeed marks the temperature where the
paramagnetic state destabilizes, in favour of a non-trivial pure state. In a non-trivial pure state,
i.e. for $m^* > 0$, we find that both the asymptotic average $\langle q_\mu\rangle_\infty$ and the asymptotic variance
$\Xi_\mu(\infty)$ for a fluctuation direction $\mu$ diverge when $\ell_\mu \to 0$. This again makes sense: the condition
$\ell_\mu > 0$ for all $\mu > 1$ is the condition for the pure state corresponding to pattern one to be
macroscopically stable, with $\ell_\mu = 0$ signaling destabilization of this pure state in favour of an
alternative pure state $\mu > 1$ (see [10]). Expansion of the macroscopic fixed-point equation close
to $T_c$, for that particular pure state which is the first to order as the temperature is lowered (i.e.
we now assume $\max_\mu w_\mu = w_1$ and $T_c = w_1$), gives

$$0 < \kappa = \frac{T_c - T}{T_c} \ll 1: \qquad \beta w_1 = 1 + \kappa + O(\kappa^2), \qquad m^* = \sqrt{3\kappa} + O(\kappa)$$

This allows us to expand the finite size terms in powers of the rescaled distance $\kappa$ from the
critical temperature:

$$\langle q_1\rangle_\infty = 0, \qquad \langle q_{\mu>1}\rangle_\infty = \sqrt{3\kappa}\,R_\mu\frac{w_1}{w_1 - w_\mu} + O(\kappa)$$

$$\Xi_1(\infty) = \frac{1}{2\kappa} + O(\kappa^0), \qquad \Xi_{\mu>1}(\infty) = \frac{w_1}{w_1 - w_\mu} + O(\kappa)$$

The approach of the transition $T \uparrow T_c$ is signalled by diverging fluctuations, as it should.

Finally we test some of the above predictions against numerical simulations. The simplest
model of our (symmetric) class is the one where all embedding strengths are equal: $w_\mu = 1$ for
all $\mu$. Here we know that for $T > 1$ the system will be paramagnetic, whereas at $T = 1$ a
second order thermodynamic transition occurs to a pure low-temperature state. If we denote
the non-negative solution of the macroscopic fixed-point equation $m^* = \tanh\beta[m^*]$ by $m(T)$,
with $m(T > 1) = 0$, we arrive at the following predictions:

$$\mu\ \text{condensed}: \qquad \langle q_\mu\rangle_\infty = 0, \qquad \langle q_\mu^2\rangle_\infty - \langle q_\mu\rangle_\infty^2 = \frac{T[1-m^2(T)]}{T - 1 + m^2(T)}$$

$$\mu\ \text{uncondensed}: \qquad \langle q_\mu\rangle_\infty R_\mu^{-1} = \frac{T\,m(T)}{T - 1 + m^2(T)}, \qquad \langle q_\mu^2\rangle_\infty - \langle q_\mu\rangle_\infty^2 = \frac{T[1-m^2(T)]}{T - 1 + m^2(T)}$$

In figure 1 we show these predicted equilibrium moments as functions of temperature, together
with results from numerical simulations carried out for $N = 10000$ for $T < 0.6$ and $N = 50000$
for $T > 0.6$. The agreement between theory and experiment is satisfactory.
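The stationary predictions for equal embedding strengths are simple enough to evaluate directly. The following sketch assumes $T > 0$ and $w_\mu = 1$ for all $\mu$; the bisection solver and both function names are illustrative choices, not part of the original analysis.

```python
import math

def magnetisation(T, tol=1e-12):
    """Non-negative solution m(T) of m = tanh(m / T), found by bisection."""
    if T >= 1.0:
        return 0.0
    # for T < 1: tanh(m/T) - m is positive just above 0 and negative at m = 1
    lo, hi = 1e-12, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.tanh(mid / T) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def stationary_moments(T):
    """Predicted stationary moments for w_mu = 1.

    Returns (mean_uncondensed, variance), where mean_uncondensed is
    <q_mu>_inf / R_mu for mu > 1 and the variance is common to all mu.
    """
    m = magnetisation(T)
    denom = T - 1.0 + m * m
    return T * m / denom, T * (1.0 - m * m) / denom
```

At exactly $T = 1$ the denominator vanishes, reflecting the divergence of the predicted fluctuations at the transition, so `stationary_moments(1.0)` raises a `ZeroDivisionError` rather than returning a number.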
Figure 1: Comparison between the theoretical predictions for the moments of the finite-size
contributions to the pattern overlaps and numerical simulations, in stationary states. Left
graph: normalised averages $\langle q_1\rangle_\infty$ and $\langle q_\mu\rangle_\infty R_\mu^{-1}$ ($\mu > 1$). Right graph: variances $\langle q_\mu^2\rangle_\infty - \langle q_\mu\rangle_\infty^2$.
Solid lines: theoretical predictions. Markers: simulation results, for $N = 10000$ for $T < 0.6$ and
$N = 50000$ for $T > 0.6$.
4.3 Zero Temperature

For $T \to 0$ (or $\beta \to \infty$) the mean-field equation for the amplitude of pure states reduces to
$\frac{d}{dt}m(t) = 1 - m(t)$, given our convention $m(0) > 0$, with solution

$$m(t) = m(0)\,e^{-t} + 1 - e^{-t} \qquad (50)$$

Since $m(t) > 0$ for any time (i.e. we will always be away from the $T = 0$ discontinuities at
$m = 0$), we may deal with the non-trivial terms in our problem by using

$$\lim_{\beta\to\infty}\beta\left[1 - \tanh^2\beta[w_1m(t)]\right] = w_1^{-1}\lim_{m\to m(t)}\lim_{\beta\to\infty}\frac{\partial}{\partial m}\tanh[\beta w_1m] = 2w_1^{-1}\lim_{m\to m(t)}\delta(m) = 0$$

With this identity we obtain for $T = 0$:

$$K(t) = -1, \qquad \ell_\mu(t) = 1, \qquad D(t) = e^{-t}[1 - m(0)]$$

which, in turn, gives the simple propagator $G(t) = I\,e^{-t}$ and the following moments for the
finite size corrections:

$$\mu\ \text{condensed}: \qquad \langle q_\mu\rangle_t = 0, \qquad \langle q_\mu^2\rangle_t - \langle q_\mu\rangle_t^2 = [1-m(0)]\,e^{-t}\left\{2 - [1-m(0)]\,e^{-t}\right\}$$

$$\mu\ \text{uncondensed}: \qquad \langle q_\mu\rangle_t R_\mu^{-1} = m(t), \qquad \langle q_\mu^2\rangle_t - \langle q_\mu\rangle_t^2 = [1-m(0)]\,e^{-t}\left\{2 - [1-m(0)]\,e^{-t}\right\}$$

with $m(t)$ given in (50). The 'frozen' correction to the mean-field laws, i.e. the term $\langle q_\mu\rangle_t$,
increases in absolute strength as time progresses. The fluctuations, which at $T = 0$ have their
origin purely in the randomness of the order of the single-spin updates, decrease to zero
exponentially. In figure 2 we show these predicted zero temperature moments as functions of time,
together with results from numerical simulations carried out for $N = 5000$. Again the agreement
between theory and experiment is quite satisfactory.



5 Application to Non-Equilibrium Models 

In this section we will apply our theory to non-symmetric systems, i.e. $A_{\mu\nu} \neq A_{\nu\mu}$, for which
detailed balance does not hold. We will restrict ourselves to the case $p = 2$ and $W(\theta) = \delta(\theta)$ for
simplicity; extensions to larger values of $p$ and/or non-zero external fields are straightforward
and are not likely to generate new physics. We define an initial microscopic distribution with
statistically independent spins, in order to guarantee a Gaussian shape for $\mathcal{P}_0(q)$, given by

$$p_0(\sigma) = \prod_i\Big\{\frac{1}{2}\left[1 - m_1(0) - m_2(0)\right] + m_1(0)\,\delta_{\sigma_i,\xi_i^1} + m_2(0)\,\delta_{\sigma_i,\xi_i^2}\Big\} \qquad (51)$$

Obviously, we have to restrict ourselves to the physical region, defined by the two conditions
$|m_1(0) + m_2(0)| < 1$ and $|m_1(0) - m_2(0)| < 1$. Our definition generates the required initial
macroscopic observables, $\lim_{N\to\infty}\langle m_\mu(\sigma)\rangle_0 = m_\mu(0)$ ($\mu = 1,2$), and gives the following initial
moments for the finite size variables:

$$\langle q\rangle_0 = R\begin{pmatrix} m_2(0) \\ m_1(0) \end{pmatrix}, \qquad \Xi(0) = \begin{pmatrix} 1 - m^2(0) & -2m_1(0)m_2(0) \\ -2m_1(0)m_2(0) & 1 - m^2(0) \end{pmatrix} \qquad (52)$$

with $m(0) = (m_1(0), m_2(0))$ and with $R = \lim_{N\to\infty}\frac{1}{\sqrt{N}}\sum_i\xi_i^1\xi_i^2$.
Figure 2: Comparison between the theoretical predictions for the moments of the finite-size
contributions to the pattern overlaps and numerical simulations, at zero temperature, as functions
of time. Upper graph: normalised averages $\langle q_1\rangle_t$ and $\langle q_\mu\rangle_t R_\mu^{-1}$ ($\mu > 1$). Lower graph: variances
$\langle q_\mu^2\rangle_t - \langle q_\mu\rangle_t^2$. Solid lines: theoretical predictions. Markers: simulation results, for $N = 5000$.



5.1 Non-Equilibrium Stationary States 

In this subsection we will study the class of networks where the matrix $A$ has the form

$$A = \begin{pmatrix} 1 & \epsilon \\ 0 & 1 \end{pmatrix}$$

with $\epsilon \geq 0$. These systems obey detailed balance only for $\epsilon = 0$. The mean-field equations,
describing the overlap evolution in the $N \to \infty$ limit, are given by

$$\frac{d}{dt}m = -m + \frac{1}{2}\tanh\beta[m_1 + (\epsilon+1)m_2]\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \frac{1}{2}\tanh\beta[m_1 + (\epsilon-1)m_2]\begin{pmatrix} 1 \\ -1 \end{pmatrix} \qquad (53)$$

In the low temperature regime these equations have two types of fixed-points. Firstly, for $T < 1$
one finds the two non-trivial fixed-points $m^* = \pm(m^*, 0)$, related only to pattern one, where $m^*$
is the positive solution of $m^* = \tanh[\beta m^*]$. The second set of fixed-points is related to pattern
two. For $\epsilon = 0$, these are simply the pure states $m^* = \pm(0, m^*)$. For $\epsilon \neq 0$, however, they move
away from the $m_1 = 0$ axis in the $(m_1, m_2)$ plane, towards the pattern one pure states, until the
fixed-points merge pair-wise. The magnitude of this displacement increases with $\epsilon$; the rate of
increase is low for small $\epsilon$, but large close to the point where the fixed-points merge. The
value of $\epsilon$ where merger occurs decreases with increasing temperature. In all cases the second
class of fixed-points is found to have disappeared for $\epsilon = 1$.

We will study the finite size effects for the pure fixed-point $m^* = (m^*, 0)$, and their
dependence on the parameter $\epsilon$, which can be interpreted as measuring the degree of violation of
detailed balance. To assess the macroscopic stability of this pure state we study the effect of
perturbations: $m^*(t) = (m^*, 0) + (\delta_1(t), \delta_2(t))$, with $|\delta_1(t)| \ll 1$ and $|\delta_2(t)| \ll 1$. Linearisation
of the mean-field laws gives

$$\frac{d}{dt}\begin{pmatrix} \delta_1 \\ \delta_2 \end{pmatrix} = \left\{\big[\beta[1-(m^*)^2] - 1\big]\,I + \beta\epsilon[1-(m^*)^2]\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\right\}\begin{pmatrix} \delta_1 \\ \delta_2 \end{pmatrix} + \ldots$$

with the solution

$$\begin{pmatrix} \delta_1(t) \\ \delta_2(t) \end{pmatrix} = e^{[\beta[1-(m^*)^2]-1]t}\begin{pmatrix} \delta_1(0) + \beta\epsilon t[1-(m^*)^2]\,\delta_2(0) \\ \delta_2(0) \end{pmatrix}$$

Therefore the pure states $m^* = (m^*, 0)$ are (globally) stable if and only if $\beta[1-(m^*)^2] - 1 < 0$.
This condition is met by the solution of $m^* = \tanh[\beta m^*]$ as soon as it is non-zero, i.e. for all
$T < 1$.

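For finite $N$ the mean-field picture will be modified by finite size effects. In the pure
fixed-point $m^* = (m^*, 0)$, obtained following a pure initialisation $m^*(0) = (m^*(0), 0)$, the stationary
Ornstein-Uhlenbeck process is characterised by

$$L = I - \beta[1-(m^*)^2]\,A, \qquad D = [1-(m^*)^2]\,I, \qquad K = -m^*R\begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

We work out the propagator $G(t)$ (32) by splitting the matrix $L$ into two commuting parts,
such that $G(t) = \exp[-tL]$ factorises into two separate matrix exponentiations:

$$G(t) = \exp\left\{-t\big[1-\beta[1-(m^*)^2]\big]\,I + \epsilon\beta t[1-(m^*)^2]\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\right\} = e^{-t[1-\beta[1-(m^*)^2]]}\begin{pmatrix} 1 & \epsilon\beta t[1-(m^*)^2] \\ 0 & 1 \end{pmatrix}$$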




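The condition for the propagator to be well-behaved is identical to the condition for the
fixed-point under consideration to be macroscopically stable: $\beta[1-(m^*)^2] < 1$. By working out (33,34)
we arrive at the moments of the distribution $\mathcal{P}_t(q)$:

$$\langle q\rangle_t = m^*(0)\,R\,e^{-tL}\begin{pmatrix} 0 \\ 1 \end{pmatrix} + m^*R\,L^{-1}\left[I - e^{-tL}\right]\begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

$$\Xi(t) = e^{-tL}\begin{pmatrix} 1 - m^2(0) & -2m_1(0)m_2(0) \\ -2m_1(0)m_2(0) & 1 - m^2(0) \end{pmatrix}e^{-tL^\dagger} + 2[1-(m^*)^2]\int_0^t ds\,e^{(s-t)L}\,e^{(s-t)L^\dagger}$$

The limits $t \to \infty$ are given by:

$$\langle q\rangle_\infty = m^*R\left\{\big[1-\beta[1-(m^*)^2]\big]\,I - \epsilon\beta[1-(m^*)^2]\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\right\}^{-1}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = m^*R\begin{pmatrix} \dfrac{\epsilon\beta[1-(m^*)^2]}{[1-\beta[1-(m^*)^2]]^2} \\[2mm] \dfrac{1}{1-\beta[1-(m^*)^2]} \end{pmatrix} \qquad (54)$$

$$\Xi(\infty) = 2[1-(m^*)^2]\lim_{t\to\infty}\int_0^t ds\,e^{2(s-t)[1-\beta[1-(m^*)^2]]}\begin{pmatrix} 1 + \epsilon^2\beta^2(s-t)^2[1-(m^*)^2]^2 & -\epsilon\beta(s-t)[1-(m^*)^2] \\ -\epsilon\beta(s-t)[1-(m^*)^2] & 1 \end{pmatrix} = H(T)\begin{pmatrix} 1 + \frac{1}{2}\epsilon^2\beta^2H^2(T) & \frac{1}{2}\epsilon\beta H(T) \\ \frac{1}{2}\epsilon\beta H(T) & 1 \end{pmatrix} \qquad (55)$$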



in which

$$H(T) = \frac{1-(m^*)^2}{1 - \beta[1-(m^*)^2]}, \qquad H(0) = 0, \qquad H(1) = \infty$$

Note that $H(T > 1) = T/(T-1)$. Apart from inducing a non-zero stationary correction $\langle q_1\rangle_\infty$
to the overlap with pattern one, violation of detailed balance (i.e. having $\epsilon > 0$ rather than
$\epsilon = 0$) leads to an increase in the fluctuations of the non-trivial overlap, and a coupling of the
fluctuations in the $q_1$ and $q_2$ directions, which in the case of detailed balance would have been
statistically independent.
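The stationary correlation matrix (55) can be cross-checked numerically by relaxing equation (30) to its fixed point. The sketch below assumes the pure state $(m^*, 0)$ with $T < 1$ and the upper-triangular form $A = ((1, \epsilon), (0, 1))$ that is implied by the linearised mean-field laws; the function name, the Euler relaxation and the iteration counts are illustrative choices.

```python
import math

def stationary_covariance(T, eps, dt=1e-3, t_end=40.0):
    """Relax dXi/dt = -L Xi - Xi L^T + 2 D (equation (30)) for the 2x2 model.

    Returns (relaxed, predicted): the numerically relaxed matrix and the
    closed form (55) written in terms of H(T).
    """
    beta = 1.0 / T
    m = 1.0
    for _ in range(10000):           # fixed-point iteration for m = tanh(beta m)
        m = math.tanh(beta * m)
    c = 1.0 - m * m                  # D = c I
    a, b = 1.0 - beta * c, eps * beta * c
    L = [[a, -b], [0.0, a]]          # L = I - beta c A, with A = [[1, eps], [0, 1]]
    X = [[c, 0.0], [0.0, c]]         # any symmetric starting matrix will do
    for _ in range(int(round(t_end / dt))):
        LX = [[sum(L[i][k] * X[k][j] for k in range(2)) for j in range(2)]
              for i in range(2)]
        # X is symmetric, so (X L^T)_ij = (L X)_ji
        X = [[X[i][j] + dt * (-LX[i][j] - LX[j][i] + (2.0 * c if i == j else 0.0))
              for j in range(2)] for i in range(2)]
    H = c / a
    off = 0.5 * eps * beta * H * H
    predicted = [[H * (1.0 + 0.5 * (eps * beta * H) ** 2), off], [off, H]]
    return X, predicted
```

Because the fixed point of the Euler map coincides with the fixed point of (30) itself, the relaxed matrix should agree with the closed form up to the residual transient, independently of the step size.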

We can appreciate most clearly the effects of the correlations in the fluctuations by examining
the curl of the probability current $J(q)$ in $q$-space. In doing so we can use the stationarity
condition (38) for the correlation matrix, i.e. $\frac{1}{2}[L\Xi + (L\Xi)^\dagger] = D$ (which allows us to put
$x\cdot[D - L\Xi]x = 0$ for each $x \in \mathbb{R}^2$), as well as the symmetry of both the correlation matrix $\Xi$
and its inverse. For stationary states the probability current (37), which must be divergence-free,
reduces to

$$J(q) = \mathcal{P}(q)\left[D\,\Xi^{-1} - L\right](q - \langle q\rangle_\infty)$$

Its curl is found to be

$$\nabla\times J(q) = \mathcal{P}(q)\left\{\nabla\times\left[D\,\Xi^{-1} - L\right](q - \langle q\rangle_\infty) - (q - \langle q\rangle_\infty)\cdot(\Xi^{-1})^\dagger[D - L\Xi]\,\Xi^{-1}(q - \langle q\rangle_\infty)\right\}$$

$$= \mathcal{P}(q)\,\nabla\times\left\{\left[D\,\Xi^{-1} - L\right](q - \langle q\rangle_\infty)\right\}$$

$$= -\epsilon\beta[1-(m^*)^2]\,\mathcal{P}(q)$$

Violation of detailed balance, due to the asymmetry of the matrix $A$ for $\epsilon \neq 0$, produces a
stationary rotational current in the space of the finite size variables $q$. The magnitude of this
current is proportional to the magnitude of the parameter $\epsilon$. For $T < 1$ the prefactor $\beta[1-(m^*)^2]$
is a monotonically increasing function of temperature, starting at zero for $T = 0$ and approaching
one for $T \to 1$. The rotational current persists above the critical temperature, where we find
$\nabla\times J(q) = -\epsilon\beta\,\mathcal{P}(q)$.

Finally we test the predictions (54,55) for the moments of $\mathcal{P}_\infty(q)$ against numerical
simulations. In figure 3 we show these predicted stationary moments as functions of temperature,
together with results from numerical simulations carried out for $N = 50000$ and $\epsilon \in \{0, 0.2, 0.5\}$.
The present non-equilibrium model is found to require larger system sizes for our fluctuation
theory to hold (i.e. neglected higher orders in $N^{-1/2}$ are more prominent) than the equilibrium
models studied in earlier sections. In addition the time required for transient effects to have died
out is longer. This explains why the agreement between theory and experiment, as observed in
figure 3, although still reasonable, is less good than that observed in our previous experiments.
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Figure 3: Comparsion between the theoretical predictions for the first order moments of the 
finite-size contributions to the pattern overlaps and numerical simulations, in stationary states, 
as functions of temperature and for e 6 {0, 0.2, 0.5}. Left graph: normalised average (</i)oo-R -1 ■ 
Right graph: normalised average (q^oo-R -1 - Solid lines: theoretical predictions. Markers: 
simulation results, for N = 50000. 
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5.2 Escape Times Controlled by System Size 

As a final application of our finite size theory we turn to a model which at $T = 0$ is exactly
critical, in the sense that the mean-field flow is such that the asymptotic value $\mathbf{m}^*(\infty)$ of
the overlap vector is exactly on a regional boundary in $\mathbf{m}$-space which separates qualitatively
different macroscopic flow domains:



$$A = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$$

As a result one finds in this particular system that the relevant escape times, which dictate 
whether and when the state vector can leave a given domain, are controlled entirely by the size 
N of the system. 

At non-zero noise levels $0 < T < 1$ the solutions of the mean-field equations for $\mathbf{m}^*$, describing
the overlap evolution in the $N \to \infty$ limit, show evolution into a stable limit-cycle [10]. For
$T = 0$ the mean-field equations reduce to



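$$\frac{d}{dt}\mathbf{m}^* = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}\mathrm{sgn}[m_1^*] + \frac{1}{2}\begin{pmatrix} -1 \\ 1 \end{pmatrix}\mathrm{sgn}[m_2^*] - \mathbf{m}^*$$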



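giving

$$
\begin{array}{lll}
\text{region I}, & m_1^*(0) > 0,\ m_2^*(0) > 0: & \mathbf{m}^*(t) = \mathbf{m}^*(0)e^{-t} + (0,1)[1 - e^{-t}] \\
\text{region II}, & m_1^*(0) < 0,\ m_2^*(0) > 0: & \mathbf{m}^*(t) = \mathbf{m}^*(0)e^{-t} + (-1,0)[1 - e^{-t}] \\
\text{region III}, & m_1^*(0) < 0,\ m_2^*(0) < 0: & \mathbf{m}^*(t) = \mathbf{m}^*(0)e^{-t} + (0,-1)[1 - e^{-t}] \\
\text{region IV}, & m_1^*(0) > 0,\ m_2^*(0) < 0: & \mathbf{m}^*(t) = \mathbf{m}^*(0)e^{-t} + (1,0)[1 - e^{-t}]
\end{array}
\tag{56}
$$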



There are four qualitatively different macroscopic flow regions, separated by the two lines $m_1^* = 0$
and $m_2^* = 0$. In all four cases the macroscopic flow is directed towards a state which is exactly
at the regional boundary, such that the asymptotics of the system (i.e. whether or not the state
vector will escape to another region) will be decided purely by the finite size effects.
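The flow in the four regions can be checked with a few lines of Python. The sketch below assumes the $T = 0$ mean-field equations in the component form $\frac{d}{dt}m_1^* = \frac{1}{2}(\mathrm{sgn}[m_1^*] - \mathrm{sgn}[m_2^*]) - m_1^*$ and $\frac{d}{dt}m_2^* = \frac{1}{2}(\mathrm{sgn}[m_1^*] + \mathrm{sgn}[m_2^*]) - m_2^*$, a form chosen here because it reproduces the four asymptotic states in (56). It integrates these equations with a simple Euler scheme and compares the result with the closed-form solutions. The function names and the initial conditions are illustrative.

```python
import math

def sgn(x):
    """Sign function with sgn(0) = 0."""
    return (x > 0) - (x < 0)

def target(m):
    """Asymptotic state (1/2)(1,1) sgn(m1) + (1/2)(-1,1) sgn(m2) of the region of m."""
    s1, s2 = sgn(m[0]), sgn(m[1])
    return (0.5 * (s1 - s2), 0.5 * (s1 + s2))

def euler_flow(m0, t, dt=1e-3):
    """Integrate dm/dt = target(m) - m with explicit Euler steps."""
    m = list(m0)
    for _ in range(int(round(t / dt))):
        a = target(m)
        m = [m[k] + dt * (a[k] - m[k]) for k in range(2)]
    return tuple(m)

def closed_form(m0, t):
    """Solution (56): exponential relaxation towards the target of the initial region."""
    a = target(m0)
    w = math.exp(-t)
    return tuple(m0[k] * w + a[k] * (1.0 - w) for k in range(2))

# One illustrative starting point per region.
starts = {"I": (0.5, 0.3), "II": (-0.5, 0.3), "III": (-0.5, -0.3), "IV": (0.5, -0.3)}
for name, m0 in starts.items():
    numeric, exact = euler_flow(m0, 3.0), closed_form(m0, 3.0)
    print(f"region {name}: target {target(m0)}, Euler {numeric}, closed form {exact}")
```

In each region the trajectory relaxes towards a point with one vanishing component, i.e. towards the boundary with a neighbouring region, without crossing it at any finite time.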

Due to the overall symmetry of our models with respect to the transformation $\forall\boldsymbol{\sigma}:\ p_t(\boldsymbol{\sigma}) \to p_t(-\boldsymbol{\sigma})$
(at least in the absence of external fields), the properties of regions I and III and of
regions II and IV are pair-wise identical. Furthermore, the properties of region II follow from
those of region I via the transformation $\boldsymbol{\xi}^1 \to -\boldsymbol{\xi}^1$, and the properties of region IV follow from
those of region III via the transformation $\boldsymbol{\xi}^1 \to -\boldsymbol{\xi}^1$. This implies that without loss of generality
we can restrict our quantitative analysis to region I. We choose $T = 0$ and the initial state
$\mathbf{m}^*(0) = (m^*(0), 0)$, with $0 < m^*(0) < 1$ (i.e. in region I). The relevant quantities in the
Fokker-Planck equation (18) are then given by



Kit) 



Lit) 



Dit) 





We find the simple propagator $G(t) = e^{-t}\,\mathbb{1}$. Note that working out the relevant moments
(33,34) of $\mathcal{P}_t(\mathbf{q})$ for the present model, following the initial conditions (p2;), gives



{q)t = e-*i2m*(0) 



+ [1
m = e



_ 2t l-(m*(0)f 






1 - (m*(0)) 2 



+ 2e~*[l-e 



{ 1 -m*(0) 
l-m*(0) 1 



We want to calculate the probability that at time t the system will have escaped from region I 
to region II. To this aim we first define 



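$$\Pi_t(M) = \mathrm{Prob}[m_1(\boldsymbol{\sigma}(t)) < M] = \sum_{\boldsymbol{\sigma}} p_t(\boldsymbol{\sigma})\,\theta[M - m_1(\boldsymbol{\sigma})] \tag{57}$$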



Note that in both regions I and II we have $m_2(\boldsymbol{\sigma}) > 0$ (in fact the escape process I$\to$II happens
close to $m_2(\boldsymbol{\sigma}) = 1$). The time derivative of $\Pi_t(M)$ follows from (||), which for $T = 0$ reduces to



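$$
\begin{aligned}
\frac{d}{dt}\Pi_t(M) &= \Big\langle\Big\{M - \frac{1}{N}\sum_i \mathrm{sgn}\big[M - m_2(\boldsymbol{\sigma}) + \xi_i^1\xi_i^2[M + m_2(\boldsymbol{\sigma})]\big]\Big\}\,\delta[M - m_1(\boldsymbol{\sigma})]\Big\rangle_t + \mathcal{O}(N^{-1}) \\
&= \Big\langle\Big\{M + \theta[-M] - \frac{R}{\sqrt{N}}\,\theta[M]\Big\}\,\delta[M - m_1(\boldsymbol{\sigma})]\Big\rangle_t + \mathcal{O}(N^{-1})
\end{aligned}
$$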



with the usual definition $R = \frac{1}{\sqrt{N}}\sum_i \xi_i^1\xi_i^2$. It follows that $\frac{d}{dt}\Pi_t(M)$ is discontinuous at $M = 0$:



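$$\lim_{M\to 0^+}\frac{d}{dt}\Pi_t(M) = -\frac{R}{\sqrt{N}}\,\langle\delta[m_1(\boldsymbol{\sigma})]\rangle_t + \mathcal{O}(N^{-1})$$

$$\lim_{M\to 0^-}\frac{d}{dt}\Pi_t(M) = \langle\delta[m_1(\boldsymbol{\sigma})]\rangle_t + \mathcal{O}(N^{-1})$$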
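The origin of the discontinuity can be made explicit with a small numerical check. For $0 < |M| < m_2$ the argument of the sign function equals $2M$ on sites with $\xi_i^1\xi_i^2 = 1$ and $-2m_2$ on sites with $\xi_i^1\xi_i^2 = -1$, so that the site average depends on the patterns only through $R$. The Python sketch below draws two random patterns and compares the full site sum with the expression $M + \theta[-M] - (R/\sqrt{N})\theta[M]$. The system size, the value of $m_2$ and the function names are illustrative choices, not taken from the simulations reported here.

```python
import math
import random

def sgn(x):
    """Sign function with sgn(0) = 0."""
    return (x > 0) - (x < 0)

def theta(x):
    """Step function, theta(x) = 1 for x > 0 and 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def pattern_overlap(xi1, xi2):
    """R = N^{-1/2} sum_i xi1_i xi2_i."""
    return sum(a * b for a, b in zip(xi1, xi2)) / math.sqrt(len(xi1))

def bracket_exact(M, m2, xi1, xi2):
    """M - (1/N) sum_i sgn[M - m2 + xi1_i xi2_i (M + m2)]."""
    n = len(xi1)
    total = sum(sgn(M - m2 + a * b * (M + m2)) for a, b in zip(xi1, xi2))
    return M - total / n

def bracket_simplified(M, xi1, xi2):
    """M + theta[-M] - (R / sqrt(N)) theta[M], valid for 0 < |M| < m2."""
    r = pattern_overlap(xi1, xi2)
    return M + theta(-M) - r / math.sqrt(len(xi1)) * theta(M)

random.seed(0)
N = 1001                      # illustrative system size
xi1 = [random.choice((-1, 1)) for _ in range(N)]
xi2 = [random.choice((-1, 1)) for _ in range(N)]
m2 = 0.9                      # the escape I -> II happens close to m2 = 1

print("R =", pattern_overlap(xi1, xi2))
for M in (-0.2, -0.01, 0.01, 0.2):
    print(M, bracket_exact(M, m2, xi1, xi2), bracket_simplified(M, xi1, xi2))
```

The two expressions agree on both sides of $M = 0$. Just below $M = 0$ the bracket is close to one, whereas just above it is of order $N^{-1/2}$ with the sign of $-R$, which is the jump stated in the two limits above.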

The escape process requires times sufficiently large to allow finite size effects to come into play,
i.e. $e^{-t} = \mathcal{O}(N^{-1/2})$. We are thus led to the introduction of the new time variable$^2$: $\gamma = e^t/\sqrt{N}$.
For such times the average $\langle\delta[m_1(\boldsymbol{\sigma})]\rangle_t$ can be written as

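$$\langle\delta[m_1(\boldsymbol{\sigma})]\rangle_t = \int\! d\mathbf{q}\ \mathcal{P}_t(\mathbf{q})\,\delta\Big[m^*(0)e^{-t} + \frac{1}{\sqrt{N}}q_1 + \mathcal{O}(N^{-1})\Big] = \sqrt{\frac{N}{2\pi\,\Xi_{11}(t)}}\ \exp\Big\{-\frac{1}{2}\big[\langle q_1\rangle_t + \sqrt{N}\,m^*(0)e^{-t} + \mathcal{O}(N^{-1/2})\big]^2/\,\Xi_{11}(t)\Big\}$$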



In terms of the new time variable we write $\Pi_t(M) = P_\gamma(M)$; using $\frac{\partial}{\partial t} = \gamma\frac{\partial}{\partial\gamma}$ we then arrive at



lim ^-PJM) 

M-*o + 07 1 

d 



R 



/3n(t) 



11 



N 



lim —PJM) = - 



<9i>t + ^+0(JV-2)l 2 /H n (t) 



+ 0(N- 1 ) 



0(N C 



Note that on the $\gamma$ time-scales $\langle q_1\rangle_t = R + \mathcal{O}(N^{-1/2})$ and $\Xi_{11}(t) = \mathcal{O}(N^{-1/2})$. Consequently

m*(0)" 



7 



7 - 
m*(0) 

7 



+ 0(JV~3) 

+ O(iV )

Figure 4: Comparison between the theoretical prediction (58) for the escape time from the
initial region and numerical simulations. Solid lines: leading two orders according to the theory.
Dashed lines: indication of the potential magnitude of subsequent (neglected) orders 0(N~s). 
Markers: simulation results. 



Since $P_0(0^\pm) = 0$ (for we start with $m_1^*(0) > 0$) we are led to the following predictions. If
$R > 0$ the state vector never escapes region I; if $R < 0$, on the other hand, the state vector will
ultimately escape. Once past the regional boundary, the state vector cannot return due to the
boost described by $P_\gamma(0^-)$. Integration over $\gamma$ gives the explicit form



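$$P_\gamma(0^+) = -R\int_{m^*(0)/\gamma}^{\infty}\frac{dz}{z}\,\delta[R + z] + \mathcal{O}(N^{-1/2}) = \theta[-R]\ \theta\Big[\gamma - \frac{m^*(0)}{|R|}\Big] + \mathcal{O}(N^{-1/2})$$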



2 Here there could be a potential conflict with the assumptions of the theory. However, inspection shows
that our derivation of the Fokker-Planck equation as the correct description of the leading order finite size effects
requires $\lim_{N\to\infty} t/\sqrt{N} = 0$, which means that the theory still applies if $t = \mathcal{O}(\log N)$.

So, provided $R < 0$, the state vector leaves region I precisely when $\gamma = m^*(0)/|R|$. Translation
back into the original time variable gives the following escape time:



$$t_{\rm esc} = \frac{1}{2}\log N + \log\Big[\frac{m^*(0)}{|R|}\Big] + \ldots \tag{58}$$



If we use the simple transformations that relate the properties of the four regions, we arrive at
the following picture. If $R < 0$ the system will be able to make the transitions I$\to$II and III$\to$IV,
but never the transitions II$\to$III or IV$\to$I. For $R > 0$, on the other hand, the transitions II$\to$III
and IV$\to$I will be observed, but never the transitions I$\to$II or III$\to$IV. When $R = 0$, the case
where the two patterns are orthogonal in the first two leading orders in $N$, the escape properties
will be controlled by the $\mathcal{O}(N^{-1})$ finite size effects.

Finally we compare the prediction (58) for the escape time with the results of numerical
simulations. Figure 4 shows the average escape time as a function of $\log[m^*(0)/|R|]$. The
agreement between theory and experiments is quite satisfactory.
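For orientation, the leading two orders of the escape time are easily evaluated. The following Python sketch uses only the escape condition $\gamma = m^*(0)/|R|$ together with $\gamma = e^t/\sqrt{N}$, and neglects all higher-order corrections; the function name and the parameter values are illustrative and are not those of the simulations in figure 4.

```python
import math

def escape_time(N, m0, R):
    """Leading two orders of the escape time from region I.

    N  : system size
    m0 : initial overlap m*(0), with 0 < m0 < 1
    R  : pattern overlap N^{-1/2} sum_i xi1_i xi2_i
    Returns math.inf for R > 0, where the state vector never leaves region I.
    """
    if not 0.0 < m0 < 1.0:
        raise ValueError("m0 must lie strictly between 0 and 1")
    if R == 0:
        # In this case the escape is governed by higher-order finite size
        # effects, which this leading-order estimate does not cover.
        raise ValueError("R = 0 is outside the scope of the leading-order result")
    if R > 0:
        return math.inf
    # gamma = exp(t) / sqrt(N) = m0 / |R|, solved for t.
    return 0.5 * math.log(N) + math.log(m0 / abs(R))

for N in (10**3, 10**4, 10**5):
    print(N, escape_time(N, m0=0.5, R=-2.0))
```

The logarithmic growth with $N$ shows that the escape time diverges in the limit $N \to \infty$, so that the escape is absent from the mean-field description.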

6 Discussion 

We have performed a systematic study of finite size effects in separable recurrent neural network 
models away from saturation. Since our approach is based on analysis of the dynamics, our 
results apply to models with detailed balance (i.e. with symmetric synaptic interactions) and to 
models without detailed balance (with non-symmetric synaptic interactions). In leading order 
in the system size ($N^{-1/2}$) the finite size effects turn out to be governed by a time-dependent
Ornstein-Uhlenbeck process, and their time-dependent probability density can be calculated in 
explicit form. The leading order finite size effects are found to come in two distinct forms: 
they show up as 'frozen' corrections to the mean field laws (dependent on the details of the 
correlations between the randomly drawn stored patterns) and as fluctuations, which have their 
origin in thermal noise in the local field alignment as well as the randomness in the selection of 
the neuron to be updated. 

We use our theory to work out several specific but characteristic examples, including symmetric
attractor neural network models, used as associative memories, and non-equilibrium
models (with non-symmetric interactions). For detailed balance models we quantify within our
fluctuation theory the familiar features of equilibrium statistical mechanics, e.g. diverging
fluctuations near phase transitions and absent probability currents in the stationary state. For
non-equilibrium models, in contrast, we find persistent rotational currents in the stationary
state. One of our non-equilibrium examples involves the calculation of escape times which are
controlled purely by finite size effects; this is an example of a problem in which the finite size
effects are significantly more than simply a correction to the corresponding result for an infinite
system. More extensive applications will be published in [19]. Comparison with extensive
numerical simulations confirms the theoretical predictions in all cases.